What is machine learning, and what can it tell us?

Machine Learning and Statistics

Is there a difference?

What isn’t different - Tasks

  • Machine Learning
    • Supervised Learning
    • Classification
    • Unsupervised Learning
    • Density Estimation
  • In statistics we call them:
    • Regression
    • Binomial Regression
    • ???
    • Density Estimation

What isn’t (really) different - Tasks

What isn’t different - Techniques

  • Techniques:
    • Random Forests
    • Elastic Nets
    • Neural Networks
  • Statistics covers the same topics:
    • The Elements of Statistical Learning
    • Advanced Data Analysis
  • Machine Learning courses often start with:
    • Linear Regression
    • Logistic Regression

What isn’t (really) different - Techniques (2)

What isn’t (really) different - Models

What is different – Approach

  • Objective:
    • Inference (statistics)
    • Prediction (machine learning)
  • Target:
    • Estimate Functional Form (statistics)
    • Minimize Cross-Validation Error (machine learning)
  • Why?
    • If you have a large amount of data, you can let it guide the shape of your regression
    • Try it out with a 10th-degree polynomial regression!
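A minimal sketch of that experiment, under assumptions that are ours rather than the slide's: the true relationship is a straight line \(y = 2x\) plus Gaussian noise, and each sample has 11 training points. With 11 distinct points the 10th-degree least-squares polynomial passes through every one of them, so it is computed here by Lagrange interpolation. Both fits are scored on held-out points lying between the training points, averaged over 200 simulated samples.

```python
import random

def lagrange_fit(xs, ys):
    """Polynomial of degree len(xs) - 1 through every training point.
    With 11 distinct points this is the 10th-degree least-squares fit (zero residuals)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

def linear_fit(xs, ys):
    """Ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def run_trials(n_trials=200, noise=0.3, seed=0):
    """Average training and held-out MSE of both fits over repeated noisy samples."""
    rng = random.Random(seed)
    train_x = [i / 10 for i in range(11)]          # 11 training points on [0, 1]
    val_x = [(i + 0.5) / 10 for i in range(10)]    # held-out points between them
    results = {"line": [0.0, 0.0], "degree 10": [0.0, 0.0]}
    for _ in range(n_trials):
        # Assumed truth for this sketch: y = 2x plus Gaussian noise
        train_y = [2 * x + rng.gauss(0, noise) for x in train_x]
        val_y = [2 * x + rng.gauss(0, noise) for x in val_x]
        for name, fit in (("line", linear_fit), ("degree 10", lagrange_fit)):
            model = fit(train_x, train_y)
            results[name][0] += mse(model, train_x, train_y) / n_trials
            results[name][1] += mse(model, val_x, val_y) / n_trials
    return results

results = run_trials()
for name, (train_err, val_err) in results.items():
    print(f"{name:>9}: training MSE {train_err:.3f}, held-out MSE {val_err:.3f}")
```

The 10th-degree fit has zero training error but a much larger held-out error than the line; judging a model by held-out error rather than in-sample fit is what the cross-validation target formalizes.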

Cross-Validation
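A sketch of \(k\)-fold cross-validation in plain Python, assuming 5 folds, simulated linear data, and two hypothetical candidate models (always predict the mean, or fit a line). Each model is fit on four folds and scored on the fifth, the squared errors are averaged over all folds, and the model with the lowest score is chosen.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score squared error on the held-out fold, average over all folds."""
    sq_errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in held_out]
        model = fit([x for x, _ in train], [y for _, y in train])
        sq_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in fold)
    return sum(sq_errors) / len(sq_errors)

def mean_fit(xs, ys):
    """Baseline model: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def linear_fit(xs, ys):
    """Ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(50)]
ys = [1 + 3 * x + rng.gauss(0, 0.3) for x in xs]   # hypothetical data

scores = {name: cross_val_mse(fit, xs, ys)
          for name, fit in (("mean", mean_fit), ("line", linear_fit))}
print(scores, "-> choose", min(scores, key=scores.get))
```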

Freedom (from p-values) isn’t free

Machine Learning, the hype versus reality

Econometric way of dealing with data

  • An economist tells you that \(Y \sim X \beta\)
  • Gather data on \(Y\) and \(X\)
  • Run a regression on them
  • Check \(p\)-values
  • If \(p \le 0.05\):
    • publish a paper
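The workflow above, sketched on simulated data (the data-generating process, sample sizes, and helper names are assumptions for illustration). The \(p\)-value uses a normal approximation to the \(t\) distribution, which is close for samples this large. The last few lines apply the same \(p \le 0.05\) rule to regressions on pure noise.

```python
import math
import random

def ols_slope_pvalue(xs, ys):
    """OLS of y on x with an intercept. Returns the slope and a two-sided p-value,
    using a normal approximation to the t distribution (close for large n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)            # standard error of the slope
    p = math.erfc(abs(beta / se) / math.sqrt(2))    # equals 2 * (1 - Phi(|beta / se|))
    return beta, p

rng = random.Random(0)

# "Theory" says Y ~ X beta; gather simulated data where the true slope is 0.2.
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [0.5 + 0.2 * x + rng.gauss(0, 1) for x in xs]

# Run the regression, check p, decide.
beta, p = ols_slope_pvalue(xs, ys)
print(f"beta = {beta:.3f}, p = {p:.2g} ->", "publish a paper" if p <= 0.05 else "no paper")

# The same rule applied to pure noise (no true effect at all).
def null_p_value(n=100):
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]
    return ols_slope_pvalue(x, y)[1]

false_positive_rate = sum(null_p_value() <= 0.05 for _ in range(1000)) / 1000
print(f"pure-noise regressions with p <= 0.05: {false_positive_rate:.3f}")
```

Roughly one pure-noise regression in twenty clears the bar, which is what a 5% threshold is designed to allow.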

Machine Learning way of dealing with data - tired

  • A domain expert tells you \(X\) affects \(Y\)
  • Run a non-parametric regression \(Y \sim X\)
  • Find the \(f(X)\) that best predicts \(Y\)
  • Good for:
    • Finding deep correlations
    • Using them
  • Not good for:
    • Settling scores amongst economists
  • How is this different from just throwing a lot of \(X\) variables at a linear regression?
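One concrete non-parametric regression is \(k\)-nearest neighbours: predict \(Y\) at a point by averaging the observed \(Y\) of the \(k\) closest \(X\) values. The sketch below assumes a single feature, \(k = 10\), and a hypothetical quadratic relationship, and compares held-out error against a straight-line fit.

```python
import random

def knn_regressor(xs, ys, k=10):
    """Non-parametric regression: predict the mean y of the k training points nearest to x."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def linear_fit(xs, ys):
    """Ordinary least-squares line, the parametric baseline."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + b * (x - mx)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
true_f = lambda x: x * x   # hypothetical nonlinear relationship nobody told us about
train_x = [rng.uniform(-1, 1) for _ in range(200)]
train_y = [true_f(x) + rng.gauss(0, 0.1) for x in train_x]
test_x = [rng.uniform(-1, 1) for _ in range(100)]
test_y = [true_f(x) + rng.gauss(0, 0.1) for x in test_x]

knn_err = mse(knn_regressor(train_x, train_y), test_x, test_y)
lin_err = mse(linear_fit(train_x, train_y), test_x, test_y)
print(f"k-NN test MSE {knn_err:.3f}, linear test MSE {lin_err:.3f}")
```

Unlike the linear fit, \(k\)-NN never commits to a functional form; the price is that it needs enough data near every \(x\) it is asked about.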

Machine Learning way of dealing with data - wired

  • Start with a very basic set of features \(X\) that may explain \(Y\) once combined
  • Gather lots of data
  • Find the \(f(X)\) that best predicts \(Y\)
  • If you have truly large datasets, you may be able to avoid needing a domain expert at all

Digits Example

Feature Representation

Regression and Pacman (tuning)

  • Regression
    • Come up with some features \(X\)
    • Collect game data from people playing
    • (logistic) regression:
      \[ \text{Action} \sim f(X) \]
  • Application
    • Computer Pacman follows the regression blindly
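A sketch of the tuning step, with a single hypothetical feature (distance to the nearest ghost) and a binary action (move away from the ghost or not) standing in for a real feature set and Pacman's move choices. The logistic regression is fit by plain gradient descent on the log loss, and the computer player then applies it without any further judgment.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """One-feature logistic regression with intercept, fit by batch gradient descent on the mean log loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # derivative of the log loss w.r.t. the linear score
            grad_w += err * x / n
            grad_b += err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical logged human play: X = distance to the nearest ghost,
# Action = 1 if the player moved away from the ghost, 0 otherwise.
ghost_distance = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
moved_away = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
w, b = fit_logistic(ghost_distance, moved_away)

def computer_pacman(distance):
    """Follow the fitted regression blindly, including in states no human was ever recorded in."""
    return "flee" if sigmoid(w * distance + b) > 0.5 else "keep eating"

print([computer_pacman(d) for d in (1, 10, 25)])
```

`computer_pacman(25)` still returns an answer even though no human was recorded at that distance: the policy simply extrapolates the fitted curve.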

Limits to regression alone (cases not observed)

Limits to regression alone (features)

Simple Solution

Deep Blue

  • 8000 features
  • Hand-tuned
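Deep Blue's evaluation was a weighted sum of about 8000 hand-tuned features. A toy version with one kind of feature, material, shows the structure: each feature is a piece count and each weight is set by hand. The weights here are the conventional textbook piece values, an assumption for illustration and not Deep Blue's numbers.

```python
# Hand-set weights: conventional textbook piece values (not Deep Blue's).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def evaluate(pieces):
    """Linear evaluation: sum over pieces of their hand-set weight.
    `pieces` lists the pieces on the board, White uppercase and Black lowercase;
    a positive score favours White."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES[piece.upper()]
        score += value if piece.isupper() else -value
    return score

start = "RNBQKBNR" + "P" * 8 + "rnbqkbnr" + "p" * 8
print(evaluate(start))                    # 0: material is level
print(evaluate(start.replace("q", "")))   # 9: Black's queen is gone
```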

Self Play

  • Start with basic features
  • Use a complex neural network to estimate: \[ \text{Probability of Winning} \sim f(X) \]
  • Make the computer use that \(f\) to play against itself
  • Produce more data and feed it back into the neural network
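The self-play loop can be sketched on a toy game, with a lookup table standing in for the neural network (a deliberate simplification). The game, assumed for illustration, is single-pile Nim: players alternately remove 1, 2 or 3 stones, and whoever takes the last stone wins. The table estimates the probability that the player to move wins, the computer plays against itself using that estimate (plus some random exploration), and every finished game is fed back to update the table.

```python
import random

MAX_PILE = 12       # hypothetical: games start with 1 to 12 stones
MOVES = (1, 2, 3)   # a player removes 1, 2 or 3 stones; taking the last stone wins

def greedy_move(values, pile):
    """Choose the move that leaves the opponent the lowest estimated chance of winning."""
    legal = [m for m in MOVES if m <= pile]
    return max(legal, key=lambda m: 1.0 - values[pile - m])

def self_play(episodes=20000, epsilon=0.2, alpha=0.05, seed=0):
    """Learn values[n] = P(player to move with n stones wins) purely from games against itself."""
    rng = random.Random(seed)
    values = [0.5] * (MAX_PILE + 1)   # uninformed starting estimates
    values[0] = 0.0                    # no stones left: the opponent just took the last one
    for _ in range(episodes):
        pile = rng.randint(1, MAX_PILE)
        visited = []                   # positions faced, alternating between the two players
        while pile > 0:
            visited.append(pile)
            if rng.random() < epsilon:
                move = rng.choice([m for m in MOVES if m <= pile])   # explore
            else:
                move = greedy_move(values, pile)                      # play with the current f
            pile -= move
        # Feed the finished game back in: the last mover won, and players alternate backwards.
        outcome = 1.0
        for position in reversed(visited):
            values[position] += alpha * (outcome - values[position])
            outcome = 1.0 - outcome
    return values

values = self_play()
print([greedy_move(values, n) for n in (5, 6, 7)])
```

After training, the greedy moves from 5, 6 and 7 stones should be 1, 2 and 3, i.e. always leave the opponent a multiple of 4, which is the known winning strategy for this game; nothing in the code supplies that strategy.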

Alpha Go

  • In a simplified format
    • Start with 10 basic features of go
    • Regress \(f(X)\) on a database of human games
    • Self play improvement
    • Beat humans

Alpha Zero

  • In a simplified format:
    • Like above, but with no hand-crafted starting features

It’s really mostly about features

  • Neural network does two things:
    • Creates complex features out of base input features
    • Combines them non-linearly
  • Only the first one really matters
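The claim that constructing features is what matters can be seen on the XOR pattern. A minimal sketch using the classic perceptron (a deliberately simple, swapped-in learner): on the raw inputs no linear boundary separates the classes, but adding one product feature, the kind of feature a hidden layer would otherwise have to discover, makes the same learner succeed.

```python
def score(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

def perceptron(features, labels, epochs=20):
    """Classic perceptron for +1/-1 labels: add y * f to the weights on every mistake."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for f, y in zip(features, labels):
            if y * score(w, f) <= 0:   # wrong side of (or on) the boundary
                w = [wi + y * fi for wi, fi in zip(w, f)]
    return w

def accuracy(w, features, labels):
    return sum(y * score(w, f) > 0 for f, y in zip(features, labels)) / len(labels)

points = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [1 if x1 != x2 else -1 for x1, x2 in points]   # XOR: +1 when the inputs differ

raw = [(1, x1, x2) for x1, x2 in points]                # bias term + base inputs
crafted = [(1, x1, x2, x1 * x2) for x1, x2 in points]   # plus one constructed feature

acc_raw = accuracy(perceptron(raw, labels), raw, labels)
acc_crafted = accuracy(perceptron(crafted, labels), crafted, labels)
print(f"raw inputs: {acc_raw:.2f}, with x1*x2 feature: {acc_crafted:.2f}")
```

The learner is identical in both runs; only the features change, and only the version with the product feature classifies all four points correctly.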

Summary

  • Two ways to generate “features”
    • Expert Knowledge
    • Fancy Neural Networks
  • Which one is better?
    • Domain Knowledge vs Architectural Knowledge

Which one is fake?

Wrong way to apply machine learning

Overfitting

Data Quality

Black Box Cargo Cult

[Estimating the value of clean air], however, is a methodological rather than a conceptual challenge, and sophisticated statistical techniques have been developed to isolate the effect of air quality from other (potentially correlated) factors.

Extrapolation

Operationalization

  • Dell & Querubin (2018), Vietnam:
    • Air Force
    • Bayesian Predictor
    • Score from 0 to 5
    • Rounding
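Rounding is what makes this operationalization consequential: two places whose continuous scores differ by a hair can land in different grades and therefore be treated differently, which is the kind of discontinuity a regression-discontinuity design can compare across. A sketch of that threshold, with hypothetical scores on the 0 to 5 scale from the slide and round-half-up as an assumed rounding rule.

```python
import math

def grade(score):
    """Round a continuous score to the nearest whole grade on the 0-5 scale.
    Halves round up; Python's built-in round() would send 2.5 to 2 (banker's rounding)."""
    return min(5, max(0, math.floor(score + 0.5)))

# Hypothetical scores for four hamlets that are nearly identical underneath.
for score in (2.48, 2.49, 2.51, 2.52):
    print(score, "->", grade(score))
```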

Feedback Loops

  • Build a statistical model:
    • \(\text{Probability} \sim f(X)\)
    • Use old data
  • Use the statistical model to target sales
    • Focus on advertising when \(\text{Probability}\) is high
  • Use the new data to improve the statistical model
  • Exploration vs Exploitation
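The feedback loop can be sketched as a multi-armed bandit, a standard framing of exploration vs exploitation (the framing, segments and sale rates are assumptions for illustration). Each round the firm advertises to one of three hypothetical customer segments and observes whether a sale follows. A purely greedy policy only targets the segment its model currently rates highest, so it never collects the data that would correct that rating; an \(\epsilon\)-greedy policy spends a fraction \(\epsilon\) of rounds exploring at random.

```python
import random

def run_campaign(true_rates, epsilon, rounds=5000, seed=0):
    """Epsilon-greedy targeting: each round pick a segment, observe a sale (1) or not (0)."""
    rng = random.Random(seed)
    k = len(true_rates)
    counts = [0] * k
    estimates = [0.0] * k   # running mean sale rate per segment (the "model")
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                          # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])  # exploit; ties go to the first segment
        sale = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        estimates[arm] += (sale - estimates[arm]) / counts[arm]   # incremental mean update
    return counts, estimates

rates = [0.1, 0.5, 0.2]   # hypothetical true sale rates; segment 1 is the best
greedy_counts, _ = run_campaign(rates, epsilon=0.0)
explore_counts, _ = run_campaign(rates, epsilon=0.1)
print("greedy:", greedy_counts)
print("epsilon-greedy:", explore_counts)
```

With ties broken toward the first segment, the greedy run spends all 5000 rounds on segment 0, the worst of the three, because it never gathers data on the others; the \(\epsilon\)-greedy run should end up concentrating on segment 1.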